A Hardware-Software Integrated Solution for Improved Single-Instruction Multi-Thread Processor Efficiency
Author
Abstract
This thesis proposes an integrated hardware-software solution for improving Single-Instruction Multiple-Thread (SIMT) branching efficiency. Unlike current SIMT hardware branching architectures, this solution lets programmers fine-tune branching behavior for their application or leave the compiler to apply a generic software solution. To support a wide range of SIMT applications with different control flow properties, three branching methods are implemented in hardware and configured through software instructions. The three methods are the contemporary Post-Dominator Re-convergence currently implemented in SIMT processors, proposed Hyperthreaded SIMT processor cores that maintain statically allocated thread warps, and proposed Dynamic Micro-Kernels that modify thread warps during run-time execution. Each branching method has its own strengths and weaknesses and yields different performance improvements depending on the application. SIMT hyper-threading turns a single SIMT processor core into multiple virtual processors, which run divergent control flow paths from threads in the same warp in parallel. The creation of virtual processor cores is controlled by a per-warp stack that is managed through software instructions. The Dynamic Micro-Kernels method creates new threads at run-time to execute divergent control flow paths instead of using branching instructions. A spawn instruction creates the threads at run-time; once created, they are placed into new warps with similar threads that follow the same control flow path. The integrated hardware-software branching architectures are evaluated using realistic benchmarks with varying control flow divergence, along with synthetic benchmarks designed to test specific branching conditions and isolate common branching behaviors.
Each hardware-implemented branching solution is tested in isolation using different software algorithms. The algorithms are designed either for general-purpose use or to target specific types of branching conditions. Results show improved performance for divergent applications and show that the choice of software algorithm affects performance.
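The cost of divergence described in the abstract can be made concrete with a toy model. The sketch below is not the thesis's hardware or instruction set: the warp width `WARP_SIZE`, the function names, and the assumption that every control flow path is one instruction long are all hypothetical choices made for illustration. It compares lane utilization when each warp serializes its divergent paths (as under stack-based reconvergence) with utilization after threads are repacked into warps that share a path, which is the general idea behind grouping spawned threads by control flow.

```python
from collections import defaultdict

WARP_SIZE = 4  # assumed tiny warp width, chosen only for illustration


def issue_slots(warps):
    """Count lane-slots issued when each warp serializes its divergent paths.

    `warps` is a list of warps; each warp is a list of path labels, one per
    thread. A warp runs each distinct path in turn with the other lanes
    masked off, so every distinct path costs one full warp-wide slot.
    Assumption: every path is exactly one instruction long.
    """
    return sum(len(set(warp)) * WARP_SIZE for warp in warps)


def utilization(warps):
    """Fraction of issued lane-slots that do useful work."""
    useful = sum(len(warp) for warp in warps)
    return useful / issue_slots(warps)


def regroup_by_path(warps):
    """Repack threads into new warps whose threads all follow the same path."""
    by_path = defaultdict(list)
    for warp in warps:
        for path in warp:
            by_path[path].append(path)
    new_warps = []
    for path in sorted(by_path):
        threads = by_path[path]
        for start in range(0, len(threads), WARP_SIZE):
            new_warps.append(threads[start:start + WARP_SIZE])
    return new_warps


if __name__ == "__main__":
    divergent = [["A", "B", "A", "B"], ["A", "A", "B", "B"]]
    print(utilization(divergent))                   # 0.5
    print(utilization(regroup_by_path(divergent)))  # 1.0
```

The model ignores the overhead of creating and scheduling new threads, so it shows only the upper bound on what regrouping could recover, not a measured result.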
Similar resources
Implementing Hardware Multithreading in a VLIW Architecture
Hardware multithreading is a well-known technique to increase the utilization of processor resources. However, most studies have focused on superscalar processor organizations. This paper analyzes which type of hardware multithreading is most suitable for a VLIW architecture and proposes two buffers to increase the efficiency of hardware multithreading. An important goal of our work is that no ...
Modeling and visualizing networked multi-core embedded software energy consumption
In this report we present a network-level multi-core energy model and a software development process workflow that allows software developers to estimate the energy consumption of multi-core embedded programs. This work focuses on a high performance, cache-less and timing predictable embedded processor architecture, XS1. Prior modelling work is improved to increase accuracy, then extended to be...
Energy Modelling of Software for a Hardware Multi-threaded Embedded Microprocessor
This paper examines a hardware multi-threaded microprocessor and discusses the impact such an architecture has on existing software energy modelling techniques. A framework is constructed for analysing the energy behaviour of the XMOS XS1-L multi-threaded processor and a variation on existing software energy models is proposed, based on analysis of collected energy data. It is shown that by com...
Superscalar Performance in a Multithreaded Microprocessor
Multithreaded processors, having hardware support for the concurrent execution of fine-grained threaded computations, are noted for their latency tolerance and low-cost synchronization. Multithreading is a technique for improving the utilization of processing elements (PEs) in parallel processing systems, thereby reducing cost/performance ratios. With increasing integrated circuit densities it ...
Software Based MPEG-2 Encoding System with Scalable and Multithreaded Architecture
MPEG-2 video encoders are now available in a variety of forms using both hardware and software based approaches. The software-based approach potentially offers a better picture quality but is computationally quite intensive. MPEG-2 video encoding can be fast processed using parallelism. A number of approaches using parallel machines or networks of workstations have been reported. While these ap...